Adapting K-Medians to Generate Normalized Cluster Centers

Authors

  • Benjamin J. Anderson
  • Deborah S. Gross
  • David R. Musicant
  • Anna M. Ritz
  • Thomas G. Smith
  • Leah E. Steinberg
Abstract

Many applications of clustering, such as text or mass spectra mining, require the use of normalized data. The spherical K-means algorithm [6], an adaptation of the traditional K-means algorithm, is highly useful for data of this kind because it produces normalized cluster centers. The K-medians clustering algorithm is also an important clustering tool because of its well-known resistance to outliers. K-medians, however, is not trivially adapted to produce normalized cluster centers. We introduce a new algorithm (called MN), inspired by spherical K-means, that integrates with K-medians clustering to produce locally optimal normalized cluster centers. We then show theoretically and experimentally that MN produces clusters of significantly higher quality than one would obtain via a simple scaling of the cluster centers produced from traditional K-medians.
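To make the two reference points in the abstract concrete, the following minimal sketch contrasts a spherical K-means style center (the mean of a cluster rescaled to unit length) with the "simple scaling" baseline for K-medians (the component-wise median rescaled to unit length). It is not the MN algorithm, which the abstract does not specify. The function names are hypothetical, and the use of the Euclidean (L2) norm for normalization is an assumption made for illustration.

```python
import math
from statistics import median


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (L2 norm is an assumption here)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def spherical_mean_center(points):
    """Spherical K-means style center: component-wise mean, then rescaled."""
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]
    return l2_normalize(mean)


def scaled_median_center(points):
    """Baseline named in the abstract: component-wise median, then rescaled.

    This is the 'simple scaling' approach, not the MN algorithm.
    """
    dims = len(points[0])
    med = [median(p[d] for p in points) for d in range(dims)]
    return l2_normalize(med)


# Hypothetical toy cluster of already-normalized points.
cluster = [l2_normalize(p) for p in ([3.0, 4.0], [4.0, 3.0], [1.0, 0.0])]
mean_center = spherical_mean_center(cluster)
median_center = scaled_median_center(cluster)
```

Both functions return unit-length centers, but rescaling the median after the fact does not in general yield the unit vector that minimizes the K-medians objective, which is the gap the abstract says MN addresses.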


Similar resources

Generating Normalized Cluster Centers with k-Medians

Many applications of clustering, such as text or mass spectra mining, require the use of normalized data. The spherical k-means algorithm [6], an adaptation of the traditional k-means algorithm, is highly useful for data of this kind because it produces normalized cluster centers. The k-medians clustering algorithm is also an important clustering tool because of its well-known resistance to outli...


Greedy bi-criteria approximations for k-medians and k-means

This paper investigates the following natural greedy procedure for clustering in the bi-criterion setting: iteratively grow a set of centers, in each round adding the center from a candidate set that maximally decreases clustering cost. In the case of k-medians and k-means, the key results are as follows. • When the method considers all data points as candidate centers, then selecting O(k log(1...


A New Algorithm for Cluster Initialization

Clustering is a very well-known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique depend on the initialization of the cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximu...


Normalized k-means clustering of hyper-rectangles

Interval variables can be measured on very different scales. We first recall a general methodology for measuring the dispersion of a variable from an optimal center, and we define two measures of dispersion associated with two optimal "centers" for interval variables. Then we study the relations between the standardization of a data table and the use in clustering of a normalized distance. F...


Fast k-clustering Queries on Road Networks

In this article, we study the k-clustering query problem on road networks, an important problem in Geographic Information Systems. Using Euclidean embeddings and reduction to fast nearest neighbor search, we devise approximation algorithms for these problems. Since these problems are difficult to solve exactly – and even hard to approximate for most variants – we compare our constant factor app...




Publication date: 2006